A Discriminative Candidate Generator for String Transformations

Authors

  • Naoaki Okazaki
  • Yoshimasa Tsuruoka
  • Sophia Ananiadou
  • Jun'ichi Tsujii

Abstract

String transformation, which maps a source string s into its desirable form t*, is related to various applications, including stemming, lemmatization, and spelling correction. An essential step in string transformation is to generate candidates to which the given string s is likely to be transformed. This paper presents a discriminative approach for generating candidate strings. We use substring substitution rules as features and score them using an L1-regularized logistic regression model. We also propose a procedure to generate negative instances that affect the decision boundary of the model. The advantage of this approach is that candidate strings can be enumerated by an efficient algorithm, because the processes of string transformation are tractable in the model. We demonstrate the remarkable performance of the proposed method in normalizing inflected words and spelling variations.
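The pipeline in the abstract has three parts: substitution rules as features, an L1-regularized logistic regression scorer, and ranked candidates. The following is a minimal Python sketch of that pipeline, not the authors' implementation, and it makes several assumptions:

- Each rule is a plain substring rewrite (alpha, beta) applied exactly once.
- The only features are rule indicators.
- Negative instances are simply the non-gold candidates the rules generate, not the paper's negative-instance procedure.
- The L1-regularized logistic regression is fit with plain proximal gradient descent (ISTA).
- Candidates are enumerated by brute force over the rule set, not by the efficient algorithm the abstract refers to.

All function names, rules, and toy data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical rule representation: (alpha, beta) rewrites ONE occurrence of
# substring alpha as beta. Rules, features and data below are illustrative.
Rule = Tuple[str, str]
Instance = Tuple[List[Rule], int]  # (active rule features, label 1/0)


def apply_rule(s: str, rule: Rule) -> List[str]:
    """All strings obtained by rewriting exactly one occurrence of alpha in s."""
    alpha, beta = rule
    out: List[str] = []
    i = s.find(alpha)
    while i != -1:
        out.append(s[:i] + beta + s[i + len(alpha):])
        i = s.find(alpha, i + 1)
    return out


def features(s: str, t: str, rules: List[Rule]) -> List[Rule]:
    """Binary (indicator) features: the rules that turn s into t in one step."""
    return [r for r in rules if t in apply_rule(s, r)]


def candidates(s: str, rules: List[Rule]) -> List[str]:
    """Every distinct string reachable from s by one rule application (brute force)."""
    seen: Dict[str, None] = {}
    for r in rules:
        for t in apply_rule(s, r):
            if t != s:
                seen[t] = None
    return list(seen)


def make_training_data(pairs: List[Tuple[str, str]], rules: List[Rule]) -> List[Instance]:
    """Gold target -> positive; every other generated candidate -> negative.
    A simplification, NOT the paper's negative-instance procedure."""
    data: List[Instance] = []
    for s, gold in pairs:
        for t in candidates(s, rules):
            data.append((features(s, t, rules), 1 if t == gold else 0))
    return data


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_l1_logreg(data: List[Instance], c: float = 0.01, lr: float = 0.5,
                    epochs: int = 500) -> Tuple[Dict[Rule, float], float]:
    """Proximal gradient descent (ISTA) on mean logistic loss + c * ||w||_1.
    The bias b is not penalized."""
    w: Dict[Rule, float] = {}
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad: Dict[Rule, float] = {}
        gb = 0.0
        for feats, y in data:
            err = sigmoid(b + sum(w.get(r, 0.0) for r in feats)) - y
            gb += err / n
            for r in feats:
                grad[r] = grad.get(r, 0.0) + err / n
        b -= lr * gb
        for r in set(w) | set(grad):
            z = w.get(r, 0.0) - lr * grad.get(r, 0.0)
            # Soft-thresholding: the proximal operator of lr * c * |w_r|.
            w[r] = math.copysign(max(abs(z) - lr * c, 0.0), z)
    return w, b


def rank(s: str, rules: List[Rule], w: Dict[Rule, float], b: float) -> List[Tuple[str, float]]:
    """Candidates of s sorted by model probability, highest first."""
    scored = [(t, sigmoid(b + sum(w.get(r, 0.0) for r in features(s, t, rules))))
              for t in candidates(s, rules)]
    return sorted(scored, key=lambda tp: tp[1], reverse=True)


RULES: List[Rule] = [("ies", "y"), ("s", "")]
PAIRS = [("studies", "study"), ("flies", "fly"), ("cats", "cat")]
w, b = train_l1_logreg(make_training_data(PAIRS, RULES))
print(rank("studies", RULES, w, b)[0][0])  # study
```

Soft-thresholding is what makes the penalty L1: rule weights whose gradient stays below `c` in magnitude are driven to exactly zero. This keeps the learned rule set sparse.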

Similar resources

Two Feature Transformation Methods Based on Genetic Algorithms for Reducing the Classification Error of Support Vector Machines

Discriminative methods are used to increase pattern recognition and classification accuracy. These methods can be applied to features as discriminant transformations, or they can be used as discriminative learning algorithms for the classifiers. Usually, the criteria of discriminative transformations differ from the criteria used to train discriminant classifiers or from their error. In this ...

Comparative Study of Linear Feature Transformation Techniques for Mandarin Digit String Recognition

Linear feature transformation techniques are widely used to improve feature discriminability. They can reduce the dimensionality of the feature space and decorrelate the feature components, so that a more discriminative model can be obtained. In this paper we compare three discriminative linear transformation approaches in a Mandarin digit string recognition (MDSR) system. Compared with the conventional Li...

Cryptographic potentials of quasigroup transformations

In this paper we show the potential of string transformations by quasigroups as a new paradigm in cryptography. To show that, we describe several algorithms, including a block cipher, a stream cipher, a strongly collision-free hash function with variable output length, and a nonlinear pseudo random number generator. All those algorithms can be implemented using only several progra...

A Calculation of the plane wave string Hamiltonian from N = 4 super-Yang-Mills theory

Berenstein, Maldacena, and Nastase have proposed, as a limit of the strong form of the AdS/CFT correspondence, that string theory in a particular plane wave background is dual to a certain subset of operators in the N = 4 super-Yang-Mills theory. Even though this is a priori a strong/weak coupling duality, the matrix elements of the string theory Hamiltonian, when expressed in gauge theory vari...

Unbiased Random Sequences from Quasigroup String Transformations

The need for true random number generators for many purposes (ranging from applications in cryptography and stochastic simulation to search heuristics and game playing) is increasing every day. Many sources of randomness possess the property of stationarity. However, while a biased die may be a good source of entropy, many applications require input in the form of unbiased bits, rather than bia...

Publication date: 2008